In this session, I will give an introduction to Bioconductor and how to perform single-cell analysis using Bioconductor packages. This session introduces the concept of “object-oriented” programming - or at least how to understand it in R. We will use the SingleCellExperiment class object as an example - however the skills learned in this session can be easily applied to other high-dimensional data analysis tasks. The SingleCellExperiment class is a so called S4 class object and the prefered object type in Bioconductor. Check out https://adv-r.hadley.nz/oo.html for in depth explanations on opbject-oriented programming in R.
Why do we want to use the SingleCellExperiment object?
.rds filesSingleCellExperiment object - you don’t need to implement functions yourselfBioconductor provides tools for the analysis and comprehension of high-throughput genomic data. Bioconductor uses the R statistical programming language, and is open source and open development. It has two releases each year, and an active user community.
Check out the https://bioconductor.org/ website.
Bioconductor version 3.10 offers 1823 software packages, 957 Annotation packages, 385 ExperimentData packages and 27 Workflows.
Bioconductor packages can be conveniently installed using BiocManager:
#install.packages("BiocManager")
#BiocManager::install("SingleCellExperiment")
# Should be '3.10'
BiocManager::version()
[1] ‘3.10’
# Should be true
BiocManager::valid()
Registered S3 method overwritten by 'dplyr':
method from
print.rowwise_df
[1] TRUE
I prefer using the BiocManager::install since it checks if there are outdated packages - important to include bug fixes. The BiocManager::install function also allows you to install CRAN and Github packages.
#BiocManager::install("igraph")
#BiocManager::install("BodenmillerGroup/Rphenograph")
Bioconductor packages are released twice a year - once in April/May, once in October. Unless you are developing Bioconductor packages, you won’t need to use Bioconductor devel. But here are more information: Bioc devel
SingleCellExperiment classHere, I will use the SingleCellExperiment class object as an example for object-oriented data analysis in R. Other widely used objects are SummarizedExperiment containers, from which the SingleCellExperiment class inherits.
To work with the object, I will mostly follow the Orchestrating Single-Cell Analysis with Bioconductor workflow, an excellent resource to do single-cell data analysis in R using Bioconductor.
Of note: The workflow was written for single-cell RNA sequencing data and some concepts (e.g. normalization) do not apply to CyTOF data analysis.
Here, we will start with the data analysis infrastructure section of the OSCA workflow.
We will first read in the data that we want to analyse. For convenience, I stored the raw expression counts (mean pixel intensity per cell) in one .csv file, the marker-specific metadata in one .csv file and the cell-specific metadata in one .csv file.
# Read in counts
pancreas_counts <- read.csv("~/Github/IntroDataAnalysis/Data/pancreas_counts.csv",
stringsAsFactors = FALSE, row.names = 1)
head(pancreas_counts)
dim(pancreas_counts)
[1] 12274 38
# Read in cell metadata
cell_meta <- read.csv("~/Github/IntroDataAnalysis/Data/pancreas_cellmeta.csv",
stringsAsFactors = FALSE, row.names = 1)
head(cell_meta)
# Read in marker metadata
marker_meta <- read.csv("~/Github/IntroDataAnalysis/Data/pancreas_markermeta.csv",
stringsAsFactors = FALSE, row.names = 1)
head(marker_meta)
In most cases, it is safe to compare the order of rows and columns:
identical(rownames(pancreas_counts), rownames(cell_meta))
[1] TRUE
identical(colnames(pancreas_counts), rownames(marker_meta))
[1] FALSE
# Check why the colnames and rownames don't match
colnames(pancreas_counts)
[1] "H3" "SMA" "INS" "CD38" "CD44" "PCSK2" "CD99" "CD68" "MPO" "SLC2A1" "CD20" "AMY2A" "CD3e" "PPY"
[15] "PIN" "PD.1" "GCG" "PDX1" "SST" "SYP" "KRT19" "CD45" "FOXP3" "CD45RA" "CD8a" "CA9" "IAPP" "KI.67"
[29] "NKX6.1" "pH3" "CD4" "CD31" "CDH" "PTPRN" "pRB" "cPARP1" "Ir191" "Ir193"
rownames(marker_meta)
[1] "H3" "SMA" "INS" "CD38" "CD44" "PCSK2" "CD99" "CD68" "MPO" "SLC2A1" "CD20" "AMY2A" "CD3e" "PPY"
[15] "PIN" "PD-1" "GCG" "PDX1" "SST" "SYP" "KRT19" "CD45" "FOXP3" "CD45RA" "CD8a" "CA9" "IAPP" "KI-67"
[29] "NKX6-1" "pH3" "CD4" "CD31" "CDH" "PTPRN" "pRB" "cPARP1" "Ir191" "Ir193"
# Fix colnames
colnames(pancreas_counts) <- gsub("\\.", "-", colnames(pancreas_counts))
identical(colnames(pancreas_counts), rownames(marker_meta))
[1] TRUE
In general, it is recommended not to use spaces or special characters when labelling markers or cells.
SingleCellExperiment objectAs you can imagine, it would be a bit annoying to handle three different objects side-by-side. That’s why we can store everything in one single container: the SingleCellExperiment object.
library(SingleCellExperiment)
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: ‘BiocGenerics’
The following objects are masked from ‘package:parallel’:
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ, clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from ‘package:stats’:
IQR, mad, sd, var, xtabs
The following objects are masked from ‘package:base’:
anyDuplicated, append, as.data.frame, basename, cbind, colnames, dirname, do.call, duplicated, eval, evalq, Filter,
Find, get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget, order, paste, pmax, pmax.int,
pmin, pmin.int, Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply, union, unique,
unsplit, which, which.max, which.min
Loading required package: S4Vectors
Attaching package: ‘S4Vectors’
The following object is masked from ‘package:base’:
expand.grid
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with 'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: DelayedArray
Loading required package: matrixStats
Attaching package: ‘matrixStats’
The following objects are masked from ‘package:Biobase’:
anyMissing, rowMedians
Loading required package: BiocParallel
Attaching package: ‘DelayedArray’
The following objects are masked from ‘package:matrixStats’:
colMaxs, colMins, colRanges, rowMaxs, rowMins, rowRanges
The following objects are masked from ‘package:base’:
aperm, apply, rowsum
sce <- SingleCellExperiment(assays = list(counts = t(pancreas_counts)), colData = cell_meta)
sce
class: SingleCellExperiment
dim: 38 12274
metadata(0):
assays(1): counts
rownames(38): H3 SMA ... Ir191 Ir193
rowData names(0):
colnames(12274): E34_1 E34_2 ... J02_4952 J02_4953
colData names(21): slide id ... Age Gender
reducedDimNames(0):
spikeNames(0):
altExpNames(0):
colnames(sce)
[1] "E34_1" "E34_2" "E34_3" "E34_4" "E34_5" "E34_6" "E34_7" "E34_8" "E34_9" "E34_10" "E34_11"
[12] "E34_12" "E34_13" "E34_14" "E34_15" "E34_16" "E34_17" "E34_18" "E34_19" "E34_20" "E34_21" "E34_22"
[23] "E34_23" "E34_24" "E34_25" "E34_26" "E34_27" "E34_28" "E34_29" "E34_30" "E34_31" "E34_32" "E34_33"
[34] "E34_34" "E34_35" "E34_36" "E34_37" "E34_38" "E34_39" "E34_40" "E34_41" "E34_42" "E34_43" "E34_44"
[45] "E34_45" "E34_46" "E34_47" "E34_48" "E34_49" "E34_50" "E34_51" "E34_52" "E34_53" "E34_54" "E34_55"
[56] "E34_56" "E34_57" "E34_58" "E34_59" "E34_60" "E34_61" "E34_62" "E34_63" "E34_64" "E34_65" "E34_66"
[67] "E34_67" "E34_68" "E34_69" "E34_70" "E34_71" "E34_72" "E34_73" "E34_74" "E34_75" "E34_76" "E34_77"
[78] "E34_78" "E34_79" "E34_80" "E34_81" "E34_82" "E34_83" "E34_84" "E34_85" "E34_86" "E34_87" "E34_88"
[89] "E34_89" "E34_90" "E34_91" "E34_92" "E34_93" "E34_94" "E34_95" "E34_96" "E34_97" "E34_98" "E34_99"
[100] "E34_100" "E34_101" "E34_102" "E34_103" "E34_104" "E34_105" "E34_106" "E34_107" "E34_108" "E34_109" "E34_110"
[111] "E34_111" "E34_112" "E34_113" "E34_114" "E34_115" "E34_116" "E34_117" "E34_118" "E34_119" "E34_120" "E34_121"
[122] "E34_122" "E34_123" "E34_124" "E34_125" "E34_126" "E34_127" "E34_128" "E34_129" "E34_130" "E34_131" "E34_132"
[133] "E34_133" "E34_134" "E34_135" "E34_136" "E34_137" "E34_138" "E34_139" "E34_140" "E34_141" "E34_142" "E34_143"
[144] "E34_144" "E34_145" "E34_146" "E34_147" "E34_148" "E34_149" "E34_150" "E34_151" "E34_152" "E34_153" "E34_154"
[155] "E34_155" "E34_156" "E34_157" "E34_158" "E34_159" "E34_160" "E34_161" "E34_162" "E34_163" "E34_164" "E34_165"
[166] "E34_166" "E34_167" "E34_168" "E34_169" "E34_170" "E34_171" "E34_172" "E34_173" "E34_174" "E34_175" "E34_176"
[177] "E34_177" "E34_178" "E34_179" "E34_180" "E34_181" "E34_182" "E34_183" "E34_184" "E34_185" "E34_186" "E34_187"
[188] "E34_188" "E34_189" "E34_190" "E34_191" "E34_192" "E34_193" "E34_194" "E34_195" "E34_196" "E34_197" "E34_198"
[199] "E34_199" "E34_200" "E34_201" "E34_202" "E34_203" "E34_204" "E34_205" "E34_206" "E34_207" "E34_208" "E34_209"
[210] "E34_210" "E34_211" "E34_212" "E34_213" "E34_214" "E34_215" "E34_216" "E34_217" "E34_218" "E34_219" "E34_220"
[221] "E34_221" "E34_222" "E34_223" "E34_224" "E34_225" "E34_226" "E34_227" "E34_228" "E34_229" "E34_230" "E34_231"
[232] "E34_232" "E34_233" "E34_234" "E34_235" "E34_236" "E34_237" "E34_238" "E34_239" "E34_240" "E34_241" "E34_242"
[243] "E34_243" "E34_244" "E34_245" "E34_246" "E34_247" "E34_248" "E34_249" "E34_250" "E34_251" "E34_252" "E34_253"
[254] "E34_254" "E34_255" "E34_256" "E34_257" "E34_258" "E34_259" "E34_260" "E34_261" "E34_262" "E34_263" "E34_264"
[265] "E34_265" "E34_266" "E34_267" "E34_268" "E34_269" "E34_270" "E34_271" "E34_272" "E34_273" "E34_274" "E34_275"
[276] "E34_276" "E34_277" "E34_278" "E34_279" "E34_280" "E34_281" "E34_282" "E34_283" "E34_284" "E34_285" "E34_286"
[287] "E34_287" "E34_288" "E34_289" "E34_290" "E34_291" "E34_292" "E34_293" "E34_294" "E34_295" "E34_296" "E34_297"
[298] "E34_298" "E34_299" "E34_300" "E34_301" "E34_302" "E34_303" "E34_304" "E34_305" "E34_306" "E34_307" "E34_308"
[309] "E34_309" "E34_310" "E34_311" "E34_312" "E34_313" "E34_314" "E34_315" "E34_316" "E34_317" "E34_318" "E34_319"
[320] "E34_320" "E34_321" "E34_322" "E34_323" "E34_324" "E34_325" "E34_326" "E34_327" "E34_328" "E34_329" "E34_330"
[331] "E34_331" "E34_332" "E34_333" "E34_334" "E34_335" "E34_336" "E34_337" "E34_338" "E34_339" "E34_340" "E34_341"
[342] "E34_342" "E34_343" "E34_344" "E34_345" "E34_346" "E34_347" "E34_348" "E34_349" "E34_350" "E34_351" "E34_352"
[353] "E34_353" "E34_354" "E34_355" "E34_356" "E34_357" "E34_358" "E34_359" "E34_360" "E34_361" "E34_362" "E34_363"
[364] "E34_364" "E34_365" "E34_366" "E34_367" "E34_368" "E34_369" "E34_370" "E34_371" "E34_372" "E34_373" "E34_374"
[375] "E34_375" "E34_376" "E34_377" "E34_378" "E34_379" "E34_380" "E34_381" "E34_382" "E34_383" "E34_384" "E34_385"
[386] "E34_386" "E34_387" "E34_388" "E34_389" "E34_390" "E34_391" "E34_392" "E34_393" "E34_394" "E34_395" "E34_396"
[397] "E34_397" "E34_398" "E34_399" "E34_400" "E34_401" "E34_402" "E34_403" "E34_404" "E34_405" "E34_406" "E34_407"
[408] "E34_408" "E34_409" "E34_410" "E34_411" "E34_412" "E34_413" "E34_414" "E34_415" "E34_416" "E34_417" "E34_418"
[419] "E34_419" "E34_420" "E34_421" "E34_422" "E34_423" "E34_424" "E34_425" "E34_426" "E34_427" "E34_428" "E34_429"
[430] "E34_430" "E34_431" "E34_432" "E34_433" "E34_434" "E34_435" "E34_436" "E34_437" "E34_438" "E34_439" "E34_440"
[441] "E34_441" "E34_442" "E34_443" "E34_444" "E34_445" "E34_446" "E34_447" "E34_448" "E34_449" "E34_450" "E34_451"
[452] "E34_452" "E34_453" "E34_454" "E34_455" "E34_456" "E34_457" "E34_458" "E34_459" "E34_460" "E34_461" "E34_462"
[463] "E34_463" "E34_464" "E34_465" "E34_466" "E34_467" "E34_468" "E34_469" "E34_470" "E34_471" "E34_472" "E34_473"
[474] "E34_474" "E34_475" "E34_476" "E34_477" "E34_478" "E34_479" "E34_480" "E34_481" "E34_482" "E34_483" "E34_484"
[485] "E34_485" "E34_486" "E34_487" "E34_488" "E34_489" "E34_490" "E34_491" "E34_492" "E34_493" "E34_494" "E34_495"
[496] "E34_496" "E34_497" "E34_498" "E34_499" "E34_500" "E34_501" "E34_502" "E34_503" "E34_504" "E34_505" "E34_506"
[507] "E34_507" "E34_508" "E34_509" "E34_510" "E34_511" "E34_512" "E34_513" "E34_514" "E34_515" "E34_516" "E34_517"
[518] "E34_518" "E34_519" "E34_520" "E34_521" "E34_522" "E34_523" "E34_524" "E34_525" "E34_526" "E34_527" "E34_528"
[529] "E34_529" "E34_530" "E34_531" "E34_532" "E34_533" "E34_534" "E34_535" "E34_536" "E34_537" "E34_538" "E34_539"
[540] "E34_540" "E34_541" "E34_542" "E34_543" "E34_544" "E34_545" "E34_546" "E34_547" "E34_548" "E34_549" "E34_550"
[551] "E34_551" "E34_552" "E34_553" "E34_554" "E34_555" "E34_556" "E34_557" "E34_558" "E34_559" "E34_560" "E34_561"
[562] "E34_562" "E34_563" "E34_564" "E34_565" "E34_566" "E34_567" "E34_568" "E34_569" "E34_570" "E34_571" "E34_572"
[573] "E34_573" "E34_574" "E34_575" "E34_576" "E34_577" "E34_578" "E34_579" "E34_580" "E34_581" "E34_582" "E34_583"
[584] "E34_584" "E34_585" "E34_586" "E34_587" "E34_588" "E34_589" "E34_590" "E34_591" "E34_592" "E34_593" "E34_594"
[595] "E34_595" "E34_596" "E34_597" "E34_598" "E34_599" "E34_600" "E34_601" "E34_602" "E34_603" "E34_604" "E34_605"
[606] "E34_606" "E34_607" "E34_608" "E34_609" "E34_610" "E34_611" "E34_612" "E34_613" "E34_614" "E34_615" "E34_616"
[617] "E34_617" "E34_618" "E34_619" "E34_620" "E34_621" "E34_622" "E34_623" "E34_624" "E34_625" "E34_626" "E34_627"
[628] "E34_628" "E34_629" "E34_630" "E34_631" "E34_632" "E34_633" "E34_634" "E34_635" "E34_636" "E34_637" "E34_638"
[639] "E34_639" "E34_640" "E34_641" "E34_642" "E34_643" "E34_644" "E34_645" "E34_646" "E34_647" "E34_648" "E34_649"
[650] "E34_650" "E34_651" "E34_652" "E34_653" "E34_654" "E34_655" "E34_656" "E34_657" "E34_658" "E34_659" "E34_660"
[661] "E34_661" "E34_662" "E34_663" "E34_664" "E34_665" "E34_666" "E34_667" "E34_668" "E34_669" "E34_670" "E34_671"
[672] "E34_672" "E34_673" "E34_674" "E34_675" "E34_676" "E34_677" "E34_678" "E34_679" "E34_680" "E34_681" "E34_682"
[683] "E34_683" "E34_684" "E34_685" "E34_686" "E34_687" "E34_688" "E34_689" "E34_690" "E34_691" "E34_692" "E34_693"
[694] "E34_694" "E34_695" "E34_696" "E34_697" "E34_698" "E34_699" "E34_700" "E34_701" "E34_702" "E34_703" "E34_704"
[705] "E34_705" "E34_706" "E34_707" "E34_708" "E34_709" "E34_710" "E34_711" "E34_712" "E34_713" "E34_714" "E34_715"
[716] "E34_716" "E34_717" "E34_718" "E34_719" "E34_720" "E34_721" "E34_722" "E34_723" "E34_724" "E34_725" "E34_726"
[727] "E34_727" "E34_728" "E34_729" "E34_730" "E34_731" "E34_732" "E34_733" "E34_734" "E34_735" "E34_736" "E34_737"
[738] "E34_738" "E34_739" "E34_740" "E34_741" "E34_742" "E34_743" "E34_744" "E34_745" "E34_746" "E34_747" "E34_748"
[749] "E34_749" "E34_750" "E34_751" "E34_752" "E34_753" "E34_754" "E34_755" "E34_756" "E34_757" "E34_758" "E34_759"
[760] "E34_760" "E34_761" "E34_762" "E34_763" "E34_764" "E34_765" "E34_766" "E34_767" "E34_768" "E34_769" "E34_770"
[771] "E34_771" "E34_772" "E34_773" "E34_774" "E34_775" "E34_776" "E34_777" "E34_778" "E34_779" "E34_780" "E34_781"
[782] "E34_782" "E34_783" "E34_784" "E34_785" "E34_786" "E34_787" "E34_788" "E34_789" "E34_790" "E34_791" "E34_792"
[793] "E34_793" "E34_794" "E34_795" "E34_796" "E34_797" "E34_798" "E34_799" "E34_800" "E34_801" "E34_802" "E34_803"
[804] "E34_804" "E34_805" "E34_806" "E34_807" "E34_808" "E34_809" "E34_810" "E34_811" "E34_812" "E34_813" "E34_814"
[815] "E34_815" "E34_816" "E34_817" "E34_818" "E34_819" "E34_820" "E34_821" "E34_822" "E34_823" "E34_824" "E34_825"
[826] "E34_826" "E34_827" "E34_828" "E34_829" "E34_830" "E34_831" "E34_832" "E34_833" "E34_834" "E34_835" "E34_836"
[837] "E34_837" "E34_838" "E34_839" "E34_840" "E34_841" "E34_842" "E34_843" "E34_844" "E34_845" "E34_846" "E34_847"
[848] "E34_848" "E34_849" "E34_850" "E34_851" "E34_852" "E34_853" "E34_854" "E34_855" "E34_856" "E34_857" "E34_858"
[859] "E34_859" "E34_860" "E34_861" "E34_862" "E34_863" "E34_864" "E34_865" "E34_866" "E34_867" "E34_868" "E34_869"
[870] "E34_870" "E34_871" "E34_872" "E34_873" "E34_874" "E34_875" "E34_876" "E34_877" "E34_878" "E34_879" "E34_880"
[881] "E34_881" "E34_882" "E34_883" "E34_884" "E34_885" "E34_886" "E34_887" "E34_888" "E34_889" "E34_890" "E34_891"
[892] "E34_892" "E34_893" "E34_894" "E34_895" "E34_896" "E34_897" "E34_898" "E34_899" "E34_900" "E34_901" "E34_902"
[903] "E34_903" "E34_904" "E34_905" "E34_906" "E34_907" "E34_908" "E34_909" "E34_910" "E34_911" "E34_912" "E34_913"
[914] "E34_914" "E34_915" "E34_916" "E34_917" "E34_918" "E34_919" "E34_920" "E34_921" "E34_922" "E34_923" "E34_924"
[925] "E34_925" "E34_926" "E34_927" "E34_928" "E34_929" "E34_930" "E34_931" "E34_932" "E34_933" "E34_934" "E34_935"
[936] "E34_936" "E34_937" "E34_938" "E34_939" "E34_940" "E34_941" "E34_942" "E34_943" "E34_944" "E34_945" "E34_946"
[947] "E34_947" "E34_948" "E34_949" "E34_950" "E34_951" "E34_952" "E34_953" "E34_954" "E34_955" "E34_956" "E34_957"
[958] "E34_958" "E34_959" "E34_960" "E34_961" "E34_962" "E34_963" "E34_964" "E34_965" "E34_966" "E34_967" "E34_968"
[969] "E34_969" "E34_970" "E34_971" "E34_972" "E34_973" "E34_974" "E34_975" "E34_976" "E34_977" "E34_978" "E34_979"
[980] "E34_980" "E34_981" "E34_982" "E34_983" "E34_984" "E34_985" "E34_986" "E34_987" "E34_988" "E34_989" "E34_990"
[991] "E34_991" "E34_992" "E34_993" "E34_994" "E34_995" "E34_996" "E34_997" "E34_998" "E34_999" "E34_1000"
[ reached getOption("max.print") -- omitted 11274 entries ]
rownames(sce)
[1] "H3" "SMA" "INS" "CD38" "CD44" "PCSK2" "CD99" "CD68" "MPO" "SLC2A1" "CD20" "AMY2A" "CD3e" "PPY"
[15] "PIN" "PD-1" "GCG" "PDX1" "SST" "SYP" "KRT19" "CD45" "FOXP3" "CD45RA" "CD8a" "CA9" "IAPP" "KI-67"
[29] "NKX6-1" "pH3" "CD4" "CD31" "CDH" "PTPRN" "pRB" "cPARP1" "Ir191" "Ir193"
dim(sce)
[1] 38 12274
ncol(sce)
[1] 12274
nrow(sce)
[1] 38
The SingleCellExperiment stores cells in the columns and markers in rows.
We can now also store the cell- and marker-specific metadata in the SCE object. These need to be DataFrame objects - a Bioconductor-specific class similar to tibble, data.table and data.frame.
library(S4Vectors)
colData(sce) <- DataFrame(cell_meta)
rowData(sce) <- DataFrame(marker_meta)
sce
class: SingleCellExperiment
dim: 38 12274
metadata(0):
assays(1): counts
rownames(38): H3 SMA ... Ir191 Ir193
rowData names(3): channel metal target
colnames(12274): E34_1 E34_2 ... J02_4952 J02_4953
colData names(21): slide id ... Age Gender
reducedDimNames(0):
spikeNames(0):
altExpNames(0):
We have now successfully created a SingleCellExperiment!
We have now stored the raw counts in the SCE object.
assays(sce)
List of length 1
names(1): counts
assayNames(sce)
[1] "counts"
dim(counts(sce))
[1] 38 12274
For dimensionality reduction and clustering, it is often preferred to work with distributions that are normal-like. We can now also store transformed and scaled counts in the same object:
assay(sce, "exprs") <- asinh(counts(sce) / 1)
assay(sce, "scaled") <- t( scale( t( assay(sce, "exprs") ) ) )
rowMeans(assay(sce, "scaled"))
H3 SMA INS CD38 CD44 PCSK2 CD99 CD68 MPO
3.363427e-16 7.002946e-17 -1.186238e-16 -6.976842e-17 2.113397e-17 -1.480132e-16 9.103553e-17 -8.606244e-17 4.342104e-17
SLC2A1 CD20 AMY2A CD3e PPY PIN PD-1 GCG PDX1
2.648344e-17 1.572091e-17 -5.637823e-17 8.435231e-17 -5.061530e-18 1.653266e-17 6.106187e-17 -3.024530e-20 -5.356386e-17
SST SYP KRT19 CD45 FOXP3 CD45RA CD8a CA9 IAPP
8.549202e-17 1.113822e-16 3.183247e-18 3.826115e-17 3.707431e-17 9.184282e-17 -5.701918e-17 3.716639e-18 -2.751022e-17
KI-67 NKX6-1 pH3 CD4 CD31 CDH PTPRN pRB cPARP1
2.973509e-17 6.585052e-17 -1.116108e-17 -1.119980e-16 -8.595206e-17 3.263600e-17 -1.060735e-16 -1.961115e-17 -1.344628e-16
Ir191 Ir193
7.201491e-18 -9.470510e-17
rowVars(assay(sce, "scaled"))
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
In this section, we will learn how to perform common dimensionality reduction methods using the scater package. We will start by peforming a principal component analysis. For this, we can use the runPCA function provided by the scater package.
library(scater)
sce <- runPCA(sce, exprs_values = "exprs", ncomponents = 10, subset_row = !(rownames(sce) %in% c("H3", "Ir191", "Ir193")))
reducedDims(sce)$PCA
We can also compute a TSNE using the runTSNE function. Of note: you need to set a seed for reproducibility. Also, the default perplexity value for the runTSNE function is the number of cells divided by 5. This is far larger than the default value for Rtsne, which is 30 but should preserve the overall structure better. However, it is recommended setting different seeds and different perplexity parameters to check the effect of those on the visual appearance of the TSNE.
set.seed(12345)
sce <- runTSNE(sce, exprs_values = "exprs", subset_row = !(rownames(sce) %in% c("H3", "Ir191", "Ir193")))
reducedDims(sce)
List of length 2
names(2): PCA TSNE
Finally, we can also compute a UMAP:
set.seed(12345)
sce <- runUMAP(sce, exprs_values = "exprs", subset_row = !grepl("H3|Ir", rownames(sce)))
reducedDims(sce)
List of length 3
names(3): PCA TSNE UMAP
Again, you need to play around with the seed and the n_neighbors and the min_dist parameter to see how this affects the visual appearance. In general, the non-linear dimensionality reduction methods (tSNE, UMAP) only offer a visual tool and should not be used for clustering.
The scater package offers a number of functions to visualize the SingleCellExperiment object.
The plotExpression function plots expression values stored in the SingleCellExperiment object:
plotExpression(sce, features = c("PIN", "CDH"), exprs_values = "exprs", other_fields = "ImageNumber") +
facet_wrap(~ImageNumber)
plotExpression(sce, features = c("PIN", "CDH"), x = "CellType", exprs_values = "exprs",
colour_by = "CellType") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
plotExpression(sce, features = "CDH", x = "PIN", exprs_values = "exprs",
colour_by = "CellType") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
The plotHeatmap function provides a wrapper function to pheatmap to visualize expression counts.
library(viridis)
Loading required package: viridisLite
plotHeatmap(sce, features = c("CDH", "PIN"), exprs_values = "exprs", colour_columns_by = "CellType")
plotHeatmap(sce, features = c("CDH", "PIN"), exprs_values = "exprs",
colour_columns_by = "CellType", columns = which(sce$CellType == "beta"))
The plotReducedDims function allows you to plot the different dimensionality reduced embeddings:
plotReducedDim(sce, "TSNE", colour_by = "CellType")
plotReducedDim(sce, "TSNE", colour_by = "case")
plotReducedDim(sce, "TSNE", colour_by = "PIN", by_exprs_values = "exprs")
plotReducedDim(sce, "TSNE", colour_by = "PIN", by_exprs_values = "exprs", shape_by = "case") +
scale_shape_manual(values = c(16, 16, 16))
plotReducedDim(sce, "UMAP", colour_by = "CellType")
plotReducedDim(sce, "UMAP", colour_by = "case")
plotReducedDim(sce, "UMAP", colour_by = "PIN", by_exprs_values = "exprs")
plotReducedDim(sce, "PCA", colour_by = "CellType")
plotReducedDim(sce, "PCA", colour_by = "case")
plotReducedDim(sce, "PCA", colour_by = "PIN", by_exprs_values = "exprs")
The good thing is that the returned plots are ggplot2 objects and can be further altered.
Graph-based clustering methods are preferred due to higher speed (https://osca.bioconductor.org/clustering.html)[https://osca.bioconductor.org/clustering.html]. But there are a few considerations:
We will first use scran to build a shared-nearest neighbour graph and perform community detection using the louvain method.
library(scran)
cur_graph <- scran::buildSNNGraph(sce, k = 10, assay.type = "exprs",
subset.row = !(rownames(sce) %in% c("H3", "Ir191", "Ir193")))
cur_clusters <- igraph::cluster_louvain(cur_graph)$membership
table(cur_clusters)
cur_clusters
1 2 3 4 5 6 7 8 9 10 11
669 1138 242 1291 1378 635 1871 1481 1635 158 1776
# Store clusters in sce object
sce$snn_clusters <- factor(cur_clusters)
# TSNE and umap visualization
plotReducedDim(sce, "TSNE", colour_by = "snn_clusters")
plotReducedDim(sce, "UMAP", colour_by = "snn_clusters")
Next, we will use the Rphenograph package for clustering. Rphenograph is also graph-based but weights edges between two nodes based on the jaccard similarity between the two sets of neighbors. In general, this approach produces finer clusters. When using approx=TRUE, you need to set a seed.
# Compare overlap between clusters
table(sce$snn_clusters, sce$rphenograph_clusters)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
1 630 4 0 0 0 0 0 3 2 0 2 6 0 1 0 1 0 20
2 49 0 0 0 1 0 0 2 6 0 32 805 0 0 0 6 0 237
3 2 235 0 0 0 0 0 0 2 0 0 1 0 2 0 0 0 0
4 19 0 0 0 0 0 0 0 5 2 74 3 0 20 0 21 0 1147
5 0 0 0 0 1324 0 0 0 0 0 0 0 0 48 1 0 5 0
6 0 0 78 1 0 459 0 0 0 0 0 0 20 0 0 1 76 0
7 2 0 0 0 16 0 0 18 6 18 1525 0 0 7 2 2 2 273
8 0 1 36 395 0 8 181 0 2 0 1 0 0 0 0 855 2 0
9 21 2 0 0 2 0 32 460 633 384 3 80 0 3 0 1 0 14
10 0 0 0 0 0 0 0 0 5 1 0 0 0 0 152 0 0 0
11 9 1 0 0 23 0 0 0 0 1 1 1 0 1717 0 0 12 11
Finally, I will also show the use-case of the flowSOM function for clustering. This is provided by the CATALYST package. However, the SingleCellExperiment object needs to be slightly altered to make it compatible with CATALYST. By default, flowSOM will use the assay(sce, "exprs") slot.
Finally, we can average the counts per cluster (or any other entry to colData(sce)).